Francisco J. Ballesteros Fabio Kon Roy H. Campbell
nemo@gsyc.inf.uc3m.es {f-kon,roy}@cs.uiuc.edu
Department of Computer Science
University of Illinois at Urbana-Champaign
Report No. UIUCDCS-R-97-2035, UILU-ENG-97-1748
August, 1997
The Off++ distributed adaptable kernel is a minimal
kernel whose only
task is to safely multiplex and export the distributed hardware
present in the network. It is designed to be used as a basis for
distributed user-level OS services. This technical report describes
the design and implementation of Off++, the object-oriented
redesign of the Off
kernel
[2, 3]. Off++ extends
the functionalities provided by Off in order to provide basic
support for the 2K operating system [5] we
are currently building. This document is meant to be the starting
point for the system literate implementation.
The well known definition for operating system is ``the software that securely abstracts and multiplexes physical resources'' [19]. By no means is it known that those resources should be contained in a single node. So, Why are our distributed operating systems based on microkernels which essentially multiplex just local resources?
Obviously system services can be later distributed when using a (centralized) microkernel. Indeed, that can be done even when using a monolithic system [16]. But this will not solve the actual problem that the system is not being actually distributed and is not transparently multiplexing both local and remote resources.
Secondly, a major drawback of current distributed operating systems is their lack of adaptability. It is known that adaptability can be achieved using a minimal microkernel as a foundation for the operating system [8, 7, 4, 10]. If the microkernel is centralized, adaptation of system services for particular requirements may harm the distribution of those services because they are distributed on top of the microkernel and this distribution is not supported by the microkernel itself. The reason is that user extensions may fail to preserve properties found in the distributed hardware and there will be no lower layer supporting them. If the microkernel is itself distributed, simple system extensions may still benefit from system distribution because the lower layer is distributed by itself. This problem can be noticed by the fact that it is necessary to modify and/or re-implement existing system services to add new distributed services to a typical microkernel based distributed system [6, 20, 13].
The 2K Operating System is being built to explore the combination of distribution and adaptability. 2K is built from two main pieces:
We expect the combination of both elements to yield a simple easy-to-use distributed and flexible OS.
By adopting a strict architecture self-awareness philosophy, we will allow system and user modules to be conscious of its own physical and logical architecture. So, the system will be able to adapt itself optimizing its performance and reliability.
This document describes the object oriented redesign of the Off kernel
[2, 3, 1], named
Off++, and is meant to be the starting point for its literate
implementation (see section 1.3).
Off++ also adds new functionalities to the kernel in order to provide
basic support for 2K. These new functionalities include
support for architecture self-awareness, system object browsing and
inspection (see subsections 2.3.5 and
2.3.7), and flexible user-defined domains (see
section 2.6).
Multiple protection domains, multi-user operation, and multitasking are optional features which can be avoided in those nodes where they are not useful.
The three basic abstractions provided by Off (shuttles, portals and DTLBs) will be described briefly in this section so that the reader will be able to better understand the following chapters.
Process services are quite simple in Off. The only abstraction provided is the Shuttle. A shuttle is an extensible hardware context. Initially, it consists only of a program counter and a stack pointer, though they can be safely extended later on to include other pieces of context or properties (such like general purpose registers, address spaces, privilege levels, etc.)
The basic interprocess communication mechanism provided by Off is the Portal. A Portal can be seen like a ``distributed interrupt line'' or a ``network-wide gate''. Portals have the basic utility of interrupts, i.e. they can be invoked transparently to make a handler perform some task.
Portals do not implement buffering and do not specify whether synchronous messages, asynchronous messages or RPCs are to be used. The mechanisms and policies used to locate and to invoke portals over the network are left up to the user. User provided transport and location protocols are used to extend the portal mechanism transparently over the network. In this way traps, interrupts, exception and user messages can be accessed and adapted in the whole system.
The only thing done by Off is to export the network hardware, and this also holds for memory management. Off implements a Distributed Software TLB. The user can establish translations from virtual to distributed physical memory addresses and the address translation hardware is safely multiplexed by Off among the competing applications.
As this document is hardcopy of a literate program [14] we
think that it is worth saying something about this technique. We do it by
citing the comp.programming.literate
newsgroup FAQ:
``Literate programming is the combination of documentation and source together in a fashion suited for reading by human beings. In fact, literate programs should be enjoyable reading, even inviting! [...] In general, literate programs combine source and documentation in a single file. Literate programming tools then parse the file to produce either readable documentation or compilable source. The WEB style of literate programming was created by D.E. Knuth during the development of his TeX typsetting software.All the original work revolves around a particular literate programming tool called WEB. Knuth says:
The philosophy behind WEB is that an experienced system programmer, who wants to provide the best possible documentation of his or her software products, needs two things simultaneously: a language like TeX for formatting, and a language like C for programming. Neither type of language can provide the best documentation by itself; but when both are appropriately combined, we obtain a system that is much more useful than either language separately.The structure of a software program may be thought of as a web that is made up of many interconnected pieces. To document such a program we want to explain each individual part of the web and how it relates to its neighbours. The typographic tools provided by TeX give us an opportunity to explain the local structure of each part by making that structure visible, and the programming tools provided by languages such as C or Fortran make it possible for us to specify the algorithms formally and unambigously. By combining the two, we can develop a style of programming that maximizes our ability to perceive the structure of a complex piece of software, and at the same time the documented programs can be mechanically translated into a working software system that matches the documentation.''
What follows have been written in a mixture of LaTeX and \textsl{C++}. The \textsl{noweb} literate programming tool has been used to obtain a LaTeX formatted version of the document as well as the C++ code which can be later compiled.
In what follows, each description of a part of the system may be followed by a chunk of code like this one
<Example chunk of code. >= (U->) class A_Class { ... };
DefinesA_Class
(links are to index).
In this example, the name of the chunk was ``Example chunk of code.''
The number on the right of the chunk name, also found on the left
margin, identifies the chunk and will be used in cross-references. We
will refer to such number as the ``chunk number''. Also, when an
interesting entity has been defined by a chunk (A_Class
in the
example) it will be stated below the chunk using a small font.
The chunk number is made the page number (identifying in which page the chunk is defined) and, if necessary, a letter to distinguish between different chunks in the same page. Using this number one is able to quickly locate any chunk in the document.
Chunks of code can include other chunks, in this case, you can use the
number enclosed with the chunk name between angles (i.e. ``''
and ``
'') to locate quickly the code for the chunk being
included. In this example, the chunk shown above will be included in
chunk 3a. We then should use the chunk numbers found in the left margin
to locate it.
<A chunk including another. >= // Some code... <Example chunk of code. > // And we also use A_Class A_Class an_object;
Finally, a chunk may be continued at a different part of the document. In this case it will be said explicitly below the chunk using a small font, as it can be seen below.
<Continuing chunk of code. >= [D->] //Mary had
<Continuing chunk of code. >+= [<-D] //a little lamb.
Note the use of the ``+'' to state that the chunk is a continuation.
Finally, we suggest reading this document as it is. Do not try to read
it following the order understood by a C++
compiler. The
references below each chunk will allow you to navigate through this
program's web.
To implement Off++ several tools have been used. They are all you need to build a system image from the on-line source for this document.
All these tools have been already used in the construction of the original Off prototype.
System services are made of a bunch of system objects exported to users. The model is actually more generic: portals are used whenever a protection domain wants to export certain services in a secure way to alien users. For each object being exported, a set of methods are made available by means of portals.
The Off++ kernel is an object oriented system. However, the boundary
between the user and the kernel is procedural: portals are used to
perform system calls (implemented in turn by means of traps), as can
be seen in figure 2.1.
The path for a system call proceeds as follows:
Only step 3 is procedural. Both user and kernel objects think that they are calling now and then to other objects. But whenever a protection domain has to be crossed, a portal is used.
In this way no remote method invocation has to be built inside the kernel so that it could be kept simple.
Thus, the only actual system calls are those needed to implement the portal delivering mechanism. Remaining system services are first provided by means of portals and then wrapped at user level by objects.
Whenever an object wants to export part of its interface to the outer
world it must create one portal per method. To automate the task, each
class with methods being exported should be labeled ENTRY
too. In
this way a compiler can generate the glue code between the portals and
the methods being exported. All the method of task are considered to
be exported to the user.
In particular, the compiler will generate automatically the implementation of this method:
<Off private methods for exported objects. >= // Creates portals to access class methods void export(void);
The code generated for export
will create one portal per method.
The portal handler will be setup so that methods are invoked
transparently: The program counter will be pointing to the method code
and the maximum expected number of message words will be setup to the
size in the stack of the method arguments. The task of extracting the
arguments is done automatically by the portal invocation mechanism and
the compiler because portal messages are always copied from (caller)
stack to (callee) stack.
Those methods exported are expected to return a non-zero value with
the error code in case the invocation fails, or zero otherwise. This
can be used by user-level wrappers to raise exceptions. That error
code will be returned by delivering a ``system call failed''
(SCERR
) event to the shuttle issuing the system call. The handler
for that event may set the error code inside the user application so
that an exception could be raised. The benefit of using events for
error notifying is that we save a register and permit ``optimistic''
system usage where most of the times the system does not need to
return anything to the user.
It should be noted that the object constructor should call export
or nothing will be actually exported.
Portal identifiers are not stored because the order in which exported system objects are created is always the same. Thus, the algorithm used to construct the portal identifiers ensures that given the node identifier, portal numbers will always be the same. This is to say that they are node-wide constants.
Although it is not part of the kernel, we will say that the compiler
mentioned above can generate user wrappers too. These wrappers are
objects with the same signature of the system object being exported
but with just those methods being labeled ENTRY
. Each wrapper
method will simply invoke to the kernel portal and raise an exception
on portal return whenever the return code is non-zero.
Note that many of these wrappers are to be built dynamically as users obtain access to system resources. They can cache the identifier of the resource being wrapped and use that identifier to to perform system calls.
To make it clear what system services are being exported to the user,
and also to avoid in-kernel authentication (e.g. unnecessary access
checks when operations are requested by the kernel itself) we will
wrap every object being exported to the user. Those wrappers will be
named with the prefix off_u
and will be described right after the objects they are
wrapping to make it clear which methods are exported to the user.
There are some basic interfaces which must be present on almost every
system object. They were initially inherited by the
Resource
class, which models system
resources, as can be seen in figure 2.2, but we will be
using aggregation instead (see figure 2.3) to avoid multiple
inheritance
.
Figure 2.2: Initial resource hierarchy (not used)
Figure 2.3: Basic resource hierarchy
To protect object access we define a generic set of access operations
(read, write, execute, delete, and protect in the current
implementation) and for each one we have a
Protection
object specifying the protection for such operation.
<Off access operations. >= // Operations which can be protected. enum off_op_t { OFF_OP_R=0x1, OFF_OP_W=0x2, OFF_OP_X=0x4, OFF_OP_D=0x8, OFF_OP_P=0x10 }; const natural_t OFF_NOPS = 5;
DefinesOFF_NOPS
,OFF_OP_D
,OFF_OP_P
,OFF_OP_R
,off_op_t
,OFF_OP_W
,OFF_OP_X
(links are to index).
Individual operations can be combined together (usually by a bit-or
operation) to specify an access mode (e.g.
OFF_OP_R|OFF_OP_X
for ``read and execute'' access mode).
Also, note that natural_t
and other basic types which we will be
using through the document have to be defined in a portable way (i.e.
ensuring that they all have the same size across different platforms).
<Off access mode. >= typedef off_op_t off_mode_t;
Definesoff_mode_t
(links are to index).
Three different objects are involved in protection:
Protection
object will specify the protection of a given
Resource
for different access modes.
Rights
object shows the access rights held by the user
accessing the resource.
AccessChecker
objects implement the policy by which
access is granted or denied for a given Protection
, access
Rights
and access mode.
\end{itemize}
The meaning of Protection
is then defined by the AccessChecker
object. Every time the system wants to know whether access is granted
or denied, it calls the resource access_check
method (which is
delegated to the AccessChecker
with the user Rights
and access
mode
). To change the protection the (also delegated) method
protect
is called instead.
<Off access checker. >= // Checks access for user operations on kernel resources // class off_AccessChecker { public: // Checks for rights for the given access mode boolean_t access_check( off_mode_t m, const off_Rights &r, const off_Protection &p ) const; // Change protection void protect(off_Protection &old, const off_Protection &new, off_mode_t m); };
Definesoff_AccessChecker
(links are to index).
Resources that need to be protected will delegate its protection
control to a system-wide AccessChecker
. Different implementations
of the off_Protection
interface will provide various protection
mechanisms including access control lists and capabilities. The
protection information will be kept in _prot
inside each object being protected.
<Off private members for protected objects. >= (U->) off_Protection _prot; // protection for this resource
<Off public methods for protected objects. >= (U->) inline boolean_t access_check( off_mode_t m, const off_Rights &r ); inline void protect(const off_Protection &p, off_mode_t m);
The second one is exported to system users.
<Other public methods of off_uResource
. >= (U->) [D->]
void protect(const off_Protection &p, off_mode_t m, const off_Rights &r);
The specific Protection
mechanism being used can be changed
without disturbing the rest of the code. Once the best
protection mechanism be known empirically, the system can be optimized
by hardwiring it and avoiding dynamic dispatching.
Initially, a Protection
is defined as a big random number per
operation. A Rights
object is also a big random number. They are
used as capabilities. The AccessChecker
considers the access to
be granted if Rights
matches the numbers found in Protection
for the access mode being specified. We believe that any other
protection model can be quickly incorporated.
Finally, it should be clear that every system object exporting a
subset of methods to the user is responsible of calling
access_check
with the appropriate access mode at the beginning of
every exported operation. Otherwise protection will not take effect.
Resources exported by the kernel will also be used by other kernel resources
(e.g. page frames are used by address translations). To avoid resource
deletion while its being used and to provide basic support for garbage
collection we employ reference
counters (RefCounter
objects). When a RefCcounter
comes
down to zero an UNUSED
exception may be raised to notify
users of unreferenced (maybe unused) resources.
This feature is actually included as a builtin service in every
Resource
, so that we could save some typing and execution time
using foo->reference()
instead of
foo->reference_counter->reference()
..
The implementation is so simple that we have folded and inlined it.
<Off public methods for reference counting objects. >= (U-> U->) // References (unreferences) an object void reference( void ) { _rc++; }; void unreference( void ) { if (!--_rc) unused(); }; natural_t get_num_refs(void) { return _rc; };
The only subtle point is that we will use the virtual method
unused
to raise any exception and then destroy the object.
<Off protected methods for reference counting objects. >= (U-> U->) // Raises UNUSED exceptions and destroyes the object virtual void unused(void) {;}
<Off private members for reference counting objects. >= (U-> U->) natural_t _rc; // # of references
To synchronize access to system objects, we will use spin locks and
bureaucracy most of the
time.
The access pattern to system objects in system call is:
make_available()
to bring any needed remote object here.
lock
them
either for reading or for writing (usual multiple readers, single
write semantics apply).
Thus, resources must include a read/write lock. The member variable
_lock
will be included in every ``lockable'' object.
<Off private members for lockable objects. >= (U->) rw_lock_t _lock; // The lock
Again, we include this as builtin functionality to save some typing and time.
<Off public methods for lockable objects. >= (U->) // Spin (un)lock on this for reading lock_state_t r_lock(void); void r_unlock(lock_state_t old_state); // Spin (un)lock on this for writing lock_state_t w_lock(void); void w_unlock(lock_state_t old_state);
The value returned by lock
routines (the lock_state
) holds the
state to be restored when the resource is released through unlock
routines, e.g. the interrupt mask (are interruptions enabled or
disabled?) and any other piece of ``global'' state changed by the lock.
In order to wait for a resource to reach a given state, or to do
long-term waiting due to resources being temporarily unavailable, we
use Bureaucrat
s. Each resource that may cause long-term waiting
(abstract resources) will provide one or more Bureaucrat
s so that
shuttles block on them until some new state is reaached.
Bureaucrat
s can be understood as condition variables but can be
implemented by other means.
The use of condition variables is not an obstacle to system distribution because we will be using lists of identifiers (and not lists of objects) to represent the set of shuttles waiting on condition variables. Thus a single list may span several nodes.
<Off bureaucrat. >= class off_Bureaucrat { public: // Puts 'who' to sleep on this bureaucrat. void wait(shtl_id_t who); // Signals this condition. Just one will be awaken. void signal(void); // Signals this condition. Everyone will be awaken. void signal_all(void); }
Definesoff_Bureaucrat
(links are to index).
%% --------------------------------------------------------------
Some objects will need to atomically obtain unique sequence numbers to create unique identifiers (for other objects contained inside). Again, this functionality is incorporated into the aggregate.
<Off private members for sequencing objects. >= (U->) off_seq_t _seq; // Next sequence nb.
<Off protected methods for sequencing objects. >= // Gets a new sequence number. Uses lock() to ensure atomicity off_seq_t get_seq(void);
Each resource in the system supports a set of well-defined operations.
Namely, those provided by building blocks that we have seen before and
a few others like freeze
, to get the state of the object and defer
any further invocations, melt
, to recreate a frozen object, and
dump
, which will return a printable representation of the resource
state.
<Off resource. >= class off_Resource { private: <Off private members for protected objects. > <Off private members for reference counting objects. > <Off private members for lockable objects. > <Other private members ofoff_Resource
. > protected: <Off protected methods for reference counting objects. > public: <Off public methods for lockable objects. > <Off public methods for reference counting objects. > <Off public methods for protected objects. > // Every resource (be it simple or compound) is able to be // dumped for inspection and frozen/melted. virtual void *freeze(void *to, size_t size)=0; virtual melt(void *from, size_t size)=0; inline boolean_t is_frozen(void); virtual char *dump(char *buf, size_t size) const; <Other public methods ofoff_Resource
. > };
Definesoff_Resource
(links are to index).
Also, we should mention that some resource members are initialized at allocation time (with the information provided by system users). That information includes the protection for the resource and an identifier used to identify the user-level entity responsible for the object after its allocation; we name such entity the object domain.
Being the \emph{Portal} the IPC mechanism provided by the kernel, the
domain identifier is indeed a portal identifier of type
off_prtl_id_t
. By doing that, we can have the kernel performing
upcalls to either user-level or in-kernel domain servers which will be
responsible for allocated system resources. Besides, the meaning of
the domain abstraction is defined by the user and not by the kernel,
as will be discussed in section 2.6.
<Other private members of off_Resource
. >= (<-U)
off_prtl_id_t r_domain; // Domain for the resource
\subsection{Resources for plain users}
These are the common entry points for every system resource.
<Off resource for users. >=
ENTRY class off_uResource {
private:
off_Resource &u_resource;
public:
void *freeze(void *to, size_t size, const off_Rights &r)=0;
void melt(void *from, size_t size, const off_Rights &r)=0;
boolean_t is_frozen(const off_Rights &r);
char *dump(char *buf, size_t size, const off_Rights &r) const;
<Other public methods of off_uResource
. >
};
Definesoff_uResource
(links are to index).
Off uses two kinds of identifiers:
off_id_t
composed by
creation_node:sequence_number:implementor_index
and used for
mobile objects (including resource containers and abstract
resources),
off_eu_id_t
composed by off_id_t:name-used-in-hw
for
the elementary hardware units being exported.
The size of each field is defined as shown.
<Off identifiers. >= [D->] typedef unsigned16_t off_node_t; // Node descriptor typedef unsigned32_t off_seq_t; // Sequence number typedef unsigned16_t off_slot_t; // Slot number typedef unsigned long off_offset_t;// Elementary unit offset number
<Off identifiers. >+= [<-D] // Relocatable top level identifier struct off_id_t { off_node_t i_node : 16; // Creation node for the object off_seq_t i_seq : 32; // Sequence number off_slot_t i_slot : 16; // Implementor's descriptor for the object }; // Hardware resource elementary unit identifier struct off_eu_id_t { off_id_t e_container; // id for its container off_offset_t e_offset; // Object id relative to the container. };
Definesoff_eu_id_t
,off_id_t
(links are to index).
Resources are arranged as recursive containers, being the node
the
outermost one.
These containers delegate allocation to an Allocator
object which
can (and will) be further wrapped
to delegate usage statistics to a
bookkeeper
. See
figure 2.4.
<Off compound resource. >=
// Compound resource. Uses an allocator to allocate resource units.
class off_CompResource : public off_Resource {
private:
// Allocator used for this container.
off_Allocator *c_pool; // Resource unit pool
<Other private members of off_CompResource
. >
public:
// Named with `relocatable' identifiers
off_id_t get_id(void) const;
// Gives a pointer to the container's allocator
off_Allocator *get_allocator(void) const;
};
Definesoff_CompResource
(links are to index).
Specific resource units maintain a reference to their container. They
also redefine the new
operator so that it could be used to
allocate resource units from a container's pool. This way we can
avoid memory fragmentation.
The reference to the container and the redefinition of new
will be
declared in subclasses of ResUnit
to avoid explicit type casts.
<Off elementary resource unit. >= // Elementary resource unit. class off_ResUnit : public off_Resource { };
Definesoff_ResUnit
(links are to index).
\subsubsection{Compound resources and resource units for plain users}
Compound resources export to users the allocation information found in
the CompResource
allocator. To do so, they export the
get_allocator
method. They also export get_id
.
<Off compound resource for users. >= ENTRY class off_uCompResource : public off_uResource { public: off_id_t get_id(const off_Rights &r) const; off_Allocator *get_allocator(const off_Rights &r) const; };
Definesoff_uCompResource
(links are to index).
We do not export to users anything besides that exported by a
Resource
in ResUnit
system objects.
<Off elementary resource unit for users. >= ENTRY class off_uResUnit : public off_uResource { };
Definesoff_uResUnit
(links are to index).
When users issue a system call, it is targeted to a particular system
object. The system call is processed in the local node only when that
object is local. In any other case we forward the system call to a
remote node by raising a FWD
exception to the user.
Assuming the target system object is local, so that the system call proceeds, it is not guaranteed that every resource needed by that system call be present in the local node.
To deal with remote resources, every resource container
(CompResource
s) implements a make_available
method so that
whenever a resource is needed it could be fetched to the local
node (if that is really required).
Besides, when the container being considered is an AbsCompResource
(a container of abstract resources), we have to take into account that
resources contained inside it (abstract resources) are able to move
around. Thus, in this case we will employ a relocation table to cache
known object locations.
In every CompResource
, make_available
proceeds as follows:
AbsCompResource
(using the slot
field of the resource identifier). If the resource is found there,
then there is nothing more to do.
MISSING
exception providing the expected location to user so
that he could fetch the remote resource and bring it to the
local container.
MISSING
exception handler completes successfully. Should the
handler fail, the system call is aborted and an ILL
(illegal
instruction) exception is raised to the Shuttle domain.
We should note that resource ``relocation'' counts mostly for portals, shuttles and DTLBs. Elementary resource units are not able to move around, so they will always be at their creation nodes.
Besides, because the system can be heterogeneous, a remote resource
brought to the local node may be meaningless to us or require some
sort of marshaling (eg. when it corresponds to a different
architecture). In this case, a XDT
exception is raised in order to
request to a trusted user translator an image of the remote object in
the native format.
Each resource
will be wrapped with a Navigator
and an
Inspector
. This way, the OS built on top of the kernel will be able
to transparently navigate through its components and later on ask
about resource attributes (e.g. bandwidth, persistence, etc.),
The class hierarchies of Navigator
s and Inspector
s mimic that
of Resource
, thus we omit it.
Every resource will include two methods, get_navigator
and
get_inspector
, to provide a pointer to its specific navigator and
inspector. Both of them will be friends of the object considered and
will be able to access its state in order to support their services.
<Other public methods of off_Resource
. >= (<-U)
// Returns the navigator for the resource.
virtual off_Navigator *get_navigator(void) const;
// Returns the inspector for the resource.
virtual off_Inspector *get_inspector(void) const;
By enquiring the navigator, references to other system resources may
be obtained and get_navigator
or get_inspector
may be used
again.
The scheme works as shown in the example of the figure 2.5:
get_navigator
, a reference to its navigator is
obtained. This navigator implements the Navigator
interface.
get_inspector
is used to obtain a reference to an
object implementing the Inspector
interface. A common protocol
can be used now to find out about the page frame persistence, size,
etc.
Of course, both get_navigator
and get_inspector
are exported
to system users.
<Other public methods of off_uResource
. >+= (<-U) [<-D]
off_Navigator *get_navigator(const off_Rights &r ) const;
off_Inspector *get_inspector(const off_Rights &r ) const;
A Navigator
provides methods to select and lookup components.
Figure 2.5: Architecture awareness support: Navigators and Inspectors.
<Off resource navigator. >= // A resource navigator ENTRY class off_Navigator { public: // Returns a reference to the resource being navigated off_Resource *get_current(void); // Returns the first element. off_Resource *get_first(void); // Advances the iteration and returns the next element. off_Resource *get_next(void); // Selects a component by name. off_Resource *operator[](char *name); // by identifiers off_Resource *operator[](const off_eu_id_t &id); off_Resource *operator[](off_id_t id); };
Definesoff_Navigator
(links are to index).
An Inspector
has two main methods to lookup attributes either by
name or by index. Attribute indexes will vary from 0
to some
arbitrary value. The first invalid attribute is guaranteed to have a
NULL
name and type.
To avoid usage of void *
, users are requested to specify the
expected type of an attribute. As it is unrealistic to force users to
know in advance which attributes can be present, a method nameof
is provided to obtain the name of a given attribute, and a method
typeof
is provided to obtain its type.
<Off resource inspector. >= // A resource inspector ENTRY class off_Inspector { public: // Returns the name of an attribute at [idx] char *nameof( natural_t idx) const; // Returns the type of an attribute off_attr_kind_t typeof(natural_t idx) const; // Look up a boolean attribute by name or by index virtual boolean_t get_bool_attr(char *name) const { return FALSE; } virtual boolean_t get_bool_attr(natural_t idx) const { return FALSE; } // Look up a natural attribute by name or by index virtual natural_t get_nat_attr(char *name) const { return 0; } virtual natural_t get_nat_attr(natural_t idx) const { return 0; } // Look up a string attribute by name or by index virtual char *get_str_attr(char *name) const { return NULL; } virtual char *get_str_attr(natural_t idx) const { return NULL; } // Look up an id attribute by name or by index virtual off_id_t get_id_attr(char *name) const { return OFF_ID_NULL; } virtual off_id_t get_id_attr(natural_t idx) const {return OFF_ID_NULL;} };
Definesoff_Inspector
(links are to index).
The default implementation is to ignore the arguments and return a
silly value, so that such function could be ignored: not every
resource has attributes of every type. Thus, only nameof
,
typeof
and one of the for previous pairs of methods must be
provided by any subclass.
The valid attribute types are
<Off attribute type ids. >= enum off_attr_kind_t { OFF_BOOL_ATTR, OFF_INT_ATTR, OFF_STR_ATTR, OFF_ID_ATTR };
Definesoff_attr_kind_t
,OFF_BOOL_ATTR
,OFF_ID_ATTR
,OFF_INT_ATTR
,OFF_STR_ATTR
(links are to index).
Finally, any resource should have at least these attributes defined
<Off attributes. >= enum { OFF_ATTR_NULL = 0, // Used to mark the end of the attr list OFF_ATTR_NAME, // Name of this resource OFF_ATTR_CLASS, // Class for this resource OFF_ATTR_DOM, // Resource domain OFF_ATTR_ID, // Resource id (or container id) OFF_ATTR_OFFSET, // Resource offest in a container or 0 OFF_ATTR_URL, // URL for the resource literate source code };
DefinesOFF_ATTR_CLASS
,OFF_ATTR_DOM
,OFF_ATTR_ID
,OFF_ATTR_NAME
,OFF_ATTR_NULL
,OFF_ATTR_OFFSET
,OFF_ATTR_URL
(links are to index).
%% --------------------------------------------------------------
Every allocator will provide the same interface, so that we could replace it at will.
<Off allocator. >= signature off_Allocator { // Returns a chunk of raw memory. // Will try to allocate n chunks starting at the i-th item. // i == 0 means allocate anywhere. void *allocate(natural_t n=1, natural_t i=0); // Deallocates the chunk pointed by p void deallocate(void *p); };
Definesoff_Allocator
(links are to index).
The signature
keyword is a GNU C++ extension which permits the use
of subtype polymorphism separately from inheritance. Some dynamic
dispatching is saved this way. However, it can be considered to be an
abstract base class or interface implemented by every allocator.
\subsection{Allocation statistics}
Statistics are maintained by bookkeeping allocators (BKAllocators
)
which are composed by wrapping the basic
Allocator
with some bookkeeping methods. These operations are
inlined.
<Off bookkeeping allocator. >= class off_BKAllocator { private: natural_t b_nfree; // # of free nodes natural_t b_nalloc; // # of allocated nodes natural_t b_maxalloc; // Maximum # of ever allocated nodes natural_t b_nallocrq; // # of alloc requests natural_t b_nfreerq; // # of free requests public: // Get some statistics natural_t get_nfree(void) const {return s_nfree;} natural_t get_nalloc(void) const {return s_nalloc;} natural_t get_maxalloc(void) const {return s_maxalloc;} natural_t get_num_allocrq(void) const {return s_nallocrq;} natural_t get_num_freerq(void) const {return s_nfreerq;} // Wrapped allocator bookkeeping methods inline void *allocate(natural_t n=1, natural_t i=0); inline void deallocate(void *p); };
Definesoff_BKAllocator
(links are to index).
\subsection{Fixed and Block allocators}
The FixedAllocator
is an Allocator
with a fixed allocation
scheme. Its implementation may vary, but in general it will consist on
a fixed array and a linked list of free nodes. This class will provide
a generic implementation which can be later optimized by specialized
allocators.
<Off fixed allocator. >= class off_FixedAllocator: public off_BKAllocator { public: // Allocates (deallocates) units from a fixed store. void *allocate(natural_t n=1, natural_t i=0); void deallocate(void *p); };
Definesoff_FixedAllocator
(links are to index).
The BlockAllocator
is an allocator which can grow dynamically.
<Off block allocator. >= class off_BlockAllocator: public off_BKAllocator { public: // Allocates (deallocates) units from a growing store void *allocate(natural_t n=1, natural_t i=0); void deallocate(void *p); // Resizes the store void grow(natural_t n); };
Definesoff_BlockAllocator
(links are to index).
Even though it is not required for the understanding of the rest of
this document, we now say a few words about the implementation of
BlockAllocator
. In Off, it is used to allocate abstract system
resources (shuttles, portals and DTLBs).
It is desirable to avoid static limits in the amount of resources an application can allocate. So, each application would allocate, for instance, as many shuttles as it wants and as many portals as it wants. To implement that, we chose to allocate abstract system resources in virtual memory.
However, adding this flexibility to the allocator could cause problems in the case when a single application allocates many resources (e.g. many portals) harming other applications' performance.
Our implementation of the BlockAllocator
helps solving this
problem by allocating elementary units (i.e. shuttles, portals and
DTLBs) so that those of a particular application can be paged out
without disturbing other applications. The kernel virtual memory may
look like the one depicted in figure 2.6. In this
situation the large number of VM pages used to store one of the
applications' (xfig) portals does not avoid the other (
gcc) to have the opportunity to freely allocate its portals.
Figure 2.6: Storage of Portals for two different applications.
As the system does not know what a ``user application'' is, it is not
possible to make the allocator use a different page for each
application. However, the BlockAllocator
can use the resource
domain identifier as a hint. So, what we actually do is to use a
different set of virtual memory pages to allocate resources for each
application domain.
Resources are not revoked by the Off++ kernel. Instead, an
UNAVAILABLE
exception is raised for the resource exhausted; so
that users could run whatever algorithm is needed to make more units
of the resource being exhausted.
The exception trigger is the memory allocator being used. System
allocators are provided with an Exhausted
object at instantiation
time for that purpose. The Exhausted
object provides a single
method notify
to let allocators trigger the proper action (by
default, raising an UNAVAILABLE
exception).
<Off Exhausted resource revocation trigger. >= signature off_Exhausted { public: // Notify that resource is exhausted void notify(void); };
Definesoff_Exhausted
(links are to index).
Hardware resources are not able to move and are always located at the same place. If they grow, they grow one container at a time (i.e. when new disks or pluggable memory banks are added to the system). Hardware resource containers may move by replacing a whole container by another. So, usually, hardware containers will use a fixed allocator.
The hardware resources exported by the kernel are arranged as
recursive containers starting from Node
. See
figure 2.7.
Figure 2.7: Hardware resource containers and elementary units.
<Off hardware resource container. >= // A hardware resource container. Units will be allocated // within it. class off_HWCompResource : public off_CompResource { public: // We may wish to cache a resource unit void make_available(off_eu_id_t u); };
Definesoff_HWCompResource
(links are to index).
Hardware resource units employ a particular naming scheme. They do
not use off_id_t
identifiers as remaining system resources do.
They use elementary unit identifiers (off_eu_id_t
) instead. These
ones are built of an off_id_t
(naming the container) and an
offset
locating the object inside the container. The offset
matches the resource physical name, i.e. that understood by the
hardware.
<Off hardware resource unit. >= // A hardware resource unit. // Uses virtualized identifiers <container id><hw id> // and a fixed allocation scheme. class off_HWResUnit : public off_ResUnit { public: // Gets the identifier for this unit. off_eu_id_t get_id(void) const; };
Definesoff_HWResUnit
(links are to index).
\subsection{Hardware resources for plain users}
Users can ask for the identifier of a given resouce unit.
<Off hardware resource unit for users.>= ENTRY class off_uHWResUnit : public off_uResUnit { public: off_eu_id_t get_id(const off_Rights &r) const; };
Definesoff_uHWResUnit
(links are to index).
Compound hardware resources do not export anything.
<Off hardware resource container for users. >= ENTRY class off_uHWCompResource : public off_uCompResource { public: };
Definesoff_uHWCompResource
(links are to index).
Shuttles, Portals, and DTLBs (see 1.2) are system
abstractions, and are able to move around. Consequently, abstract
resource containers (AbsCompResource
s) provide a method
make_avaiable
which uses a relocation table so that system code
may request to one of the system containers to make a particular
resource unit available at the local node.
They are arranged, as physical resources are, as recursive containers
being the Node
the top-level container (see
figure 2.8).
\begin{figure}[htb]
<Off system server. >= // A system resource container. class off_AbsCompResource : public off_CompResource { // System containers use a block allocator and a relocation cache. public: // We may wish to fetch a resource unit void make_available(off_id_t u); };
Definesoff_AbsCompResource
(links are to index).
<Off system resource. >= // A system resource unit. It's relocatable by nature. class off_AbsResUnit : public off_ResUnit { public: off_id_t get_id(void) const; };
Definesoff_AbsResUnit
(links are to index).
\subsection{Abstract resources for plain users}
The wrappers for abstract resources are so simple that we will not discuss them.
<Off system server for users. >= ENTRY class off_uAbsCompResource : public off_uCompResource { public: };
Definesoff_uAbsCompResource
(links are to index).
<Off system resource for users. >= ENTRY class off_uAbsResUnit : public off_uResUnit { public: off_id_t get_id(void) const; };
Definesoff_uAbsResUnit
(links are to index).
The kernel does not implements either ``processes'' nor any other ``resource-container'' abstractions. Users allocate system resources and release them back to the kernel when they are no longer needed. None of the system resources is responsible for containing the resources of a given application.
Besides, as we will see, the system resource that is the most similar to the traditional process abstraction (the ``shuttle'') does not contain the resources accessed during its execution. Thus, how are bookkeeping tasks accomplished? And, how are resources released upon application crashes?
Certainly, a user-level (i.e. a non-kernel) process or ``application''
abstraction is needed. To help on this task, the kernel delegates
resource bookkeeping to its users.
Users should implement resource domains containing a set of resources used by a user application. This concept is introduced in the system in order to help solving the three following problems:
Whenever a user issues a resource allocation request, it provides a
target domain identifier tdi
for that new resource. Then,
the kernel entitles that resource to the domain identified by tdi
(i.e. sets the r_domain
of the Resource
being allocated to the
supplied domain identifier -- as defined in
2.3).
When the resource gets finally unused, the kernel notifies that using its domain identifier as a portal name, so that users could update their allocation tables. Thus, domain identifiers must be portal identifiers.
As we said, the meaning of a domain is defined by the user, not by the kernel (although the DM can be loaded into the kernel for efficiency).
The 2K operating system will utilize this basic mechanism for implementing the concept of nested domains. From a single abstraction, it will be possible to implement replacements for traditional OS abstractions like group of users, user, process group, process, and thread. This will be possible because nothing forbids domains to persist, so that a permanent user can be seen as a persistent domain.
2K will also be able to support new abstractions like, for example, temporary users. They would be implemented as non-persistent domains and could be created by regular users inheriting a subset of the creator permissions. This would add flexibility compared to existing systems in which usually only a ``super-user'' is allowed to create new users.
A resource could be also logically moved from one domain to another by
changing its r_domain
. There is nothing in the system architecture
forbidding such feature.
A Node
object is responsible for bringing the system into operation and
also for shutting it down (i.e. to either halt
, reboot
, or suspend
it). Node
is also a singleton used as a placeholder for
node-wide system properties (e.g. the portal for the authentication server
used to establish trust relationships among distributed kernels (
auth
),
and the portal for the external data translator used to translate foreign
kernel objects to the native architecture (xdt
) are placed here.)
Being the outer-most starting point in the container hierarchy, its
id
(obtained from the method get_id
) is all the user needs to
know to start looking for available system resources.
<Off node. >= // An Off Node. // Can be also handled as a resource: freeze, melt, etc. class off_Node : public off_Resource { private: <Off private members for sequencing objects. > <Otheroff_Node
private members. > protected: <Off protected members for sequencing objects. > <Otheroff_Node
protected methods. > public: // Returns the node identifier off_id_t get_id(void) const; // Returns or sets the authorization server and // the external data translator portals. off_prtl_t get_auth(void) const; off_prtl_t get_xdt(void) const; void set_auth(off_prtl_t p); void set_xdt(off_prtl_t p); <Otheroff_Node
public methods. > // Halts, reboots, or suspends this node void halt( char *msg="System On." ); void reboot( void ); void suspend( void ); };
Definesoff_Node
(links are to index).
The Node
can be also considered to be a
façade (see [11])
for node-wide system operations.
To support efficient browsing of node components, the Node
has a
set of methods that provide access to its components. Most of them will be
simply inlined.
For example, for memory banks we have:
<Other off_Node
protected methods. >= (<-U) [D->]
// Returns the number of specific containers found at this node
inline natural_t get_num_mbanks(void) const;
// Returns a pointer to a local memory bank.
inline off_MBank *get_mbank( natural_t id = 0 ) const;
The same holds to gain access to remaining resource containers.
<Other off_Node
protected methods. >+= (<-U) [<-D]
// Get a pointer to a hardware resource container
inline off_IOBank *get_iobank( void ) const;
inline off_DMA *get_dma( void ) const;
inline off_ProcTbl *get_proctbl( void ) const;
inline off_ShtlSrv *get_shtlsrv( void ) const;
inline off_PrtlSrv *get_prtlsrv( void ) const;
inline off_DMM *get_dmm( void ) const;
Besides, a generic navigator specialized for the Node
will support
the common interface for navigation. That is primarily for system
users and not for the kernel itself.
There are some global operations that do not fit well anywhere else.
The following one, for example, redirects system output within the kernel to the serial line. Output can be redirected at any time.
<Other off_Node
public methods. >= (<-U)
// Use the serial console?
void use_serial_console( boolean_t doit = TRUE );
\section{Nodes for plain users}
The wrapper for node services export almost everything as can be seen.
<Off node for users. >= ENTRY class off_uNode : public off_uResource{ public: off_prtl_t get_auth(const off_Rights &r) const; off_prtl_t get_xdt(const off_Rights &r) const; void set_auth(off_prtl_t p, const off_Rights &r); void set_xdt(off_prtl_t p, const off_Rights &r); void halt( char *msg="System On." ); void reboot( const off_Rights &r ); void suspend( const off_Rights &r ); off_IOBank *get_iobank( const off_Rights &r ) const; off_DMA *get_dma( const off_Rights &r ) const; off_ProcTbl *get_proctbl( const off_Rights &r ) const; off_ShtlSrv *get_shtlsrv( const off_Rights &r ) const; off_PrtlSrv *get_prtlsrv( const off_Rights &r ) const; off_DMM *get_dmm( const off_Rights &r ) const; // Use the serial console? void use_serial_console( const off_Rights &r, boolean_t doit = TRUE ); };
Definesoff_uNode
(links are to index).
The system entry point, which is called by the OSKit after basic
hardware initialization is named main
. Its arguments proceed from
the secondary boot loader.
<Off main entry point. >= // main system entry point. int main(int argc, char *argv[]);
Definesmain
(links are to index).
Physical memory is distributed among different memory
containers named mbank
s. Each mbank
supports allocation and
deallocation of physical memory inside a single memory bank. To do
so, it provides the alloc
and free
methods. Also, it provides
means to locate a page frame either by address or by page frame number
(PFN).
<Off memory bank. >= // A bank of memory class off_MBank : public off_HWCompResource { public: // Allocates n contiguous page frames. off_PFrame *alloc(off_Protection *prot, natural_t n=1, off_pg_id_t at=OFF_EU_ID_NULL); // Deallocates the n contiguous page frames starting at pf. void free(off_PFrame *pf, natural_t n=1 ); //Gets a page frame from its physical address off_PFrame *operator[](off_pg_id_t pfn) const; //Gets the size of pages inside the bank vm_size_t get_pgsize(void); };
Definesoff_MBank
(links are to index).
<Off page frame identifier. >= [D->] typedef off_eu_id_t off_pg_id_t;
Definesoff_pg_id_t;
(links are to index).
PFrame
maintains the information needed about each page
frame installed in the system. Their methods are just a means to access
and update such state.
<Off page frame. >= class off_PFrame : public off_HWResUnit { public: // Gets the page frame bits. pg_bits_t get_bits(void); // Sets the page frame bits and returns the old ones. pg_bits_t set_bits(pg_bits_t b); };
Definesoff_PFrame
(links are to index).
where pg_bits_t
correspond to the bits maintained by the address
translation machinery.
<Off page frame bits data type. >= typedef pt_entry_t pg_bits_t;
Definespg_bits_t;
(links are to index).
\subsection{Memory banks and page frames for plain users}
These are the wrappers for memory banks and page frames:
<Off memory bank for users. >= ENTRY class off_uMBank : public off_uHWCompResource { public: off_uPFrame *alloc(off_Protection *prot, natural_t n=1, off_pg_id_t at=OFF_EU_ID_NULL); void free(off_uPFrame *pf, const off_Rights &r, natural_t n=1 ); off_uPFrame *fetch(off_pg_id_t pfn,const off_Rights &r) const; vm_size_t get_pgsize(const off_Rights &r); };
Definesoff_uMBank
(links are to index).
<Off page frame for users. >= ENTRY class off_uPFrame : off_uHWResUnit { public: pg_bits_t get_bits(const off_Rights &r); pg_bits_t set_bits(pg_bits_t b, const off_Rights &r); };
Definesoff_uPFrame
(links are to index).
<Off page frame identifier. >+= [<-D] typedef off_eu_id_t off_pg_id_t;
Definesoff_pg_id_t
(links are to index).
%% --------------------------------------------------------------
IO bank objects are merely used to allocate ports, as I/O should be able to operate (when permitted) from user level.
<Off IO bank. >= //An IO bank class off_IOBank : public off_HWCompResource { public: // Allocates n contiguous IO ports. off_IOPort *alloc(off_Protection *prot,natural_t n=1, off_io_id_t at=OFF_EU_ID_NULL); // Deallocates the n contiguous IO ports starting at iop. void free( off_IOPort *iop, natural_t n=1 ); //Gets an IO port from its port number. off_IOPort *operator[](off_io_id_t iop) const; };
Definesoff_IOBank
(links are to index).
<Off IO identifier. >= typedef off_eu_id_t off_io_id_t;
Definesoff_io_id_t
(links are to index).
<Off IO port. >= // An IO port class off_IOPort : public off_HWResUnit { public: // Input/Output 8, 16, 32 or 64 bit words through the port. unsigned8_t in8(void); void out8(unsigned8_t o); unsigned16_t in16(void); void out16(unsigned16_t o); unsigned32_t in32(void); void out32(unsigned32_t o); unsigned64_t in64(void); void out64(unsigned64_t o); };
Definesoff_IOPort
(links are to index).
\subsection{Input/Output for plain users}
Users may access IO ports using these wrappers.
<Off IO bank for users. >= ENTRY class off_uIOBank : public off_uHWCompResource { public: off_uIOPort *alloc(off_Protection *prot,natural_t n=1, off_io_id_t at=OFF_EU_ID_NULL); void free( off_uIOPort *iop, const off_Rights &r, natural_t n=1 ); off_uIOPort *fetch(off_io_id_t iop,const off_Rights &r) const; };
Definesoff_uIOBank
(links are to index).
<Off IO port for users. >= ENTRY class off_uIOPort: public off_uHWResUnit { public: unsigned8_t in8(const off_Rights &r,); void out8(unsigned8_t o,const off_Rights &r); unsigned16_t in16(const off_Rights &r); void out16(unsigned16_t o,const off_Rights &r); unsigned32_t in32(const off_Rights &r); void out32(unsigned32_t o,const off_Rights &r); unsigned64_t in64(const off_Rights &r); void out64(unsigned64_t o,const off_Rights &r); };
Definesoff_uIOPort
(links are to index).
%% --------------------------------------------------------------
Both traps and interrupts are allocated using per-processor event
tables. Each processor will have an associated a trap table and an
associated interrupt table. Traps may be caused by the hardware or by
the kernel itself. Software interrupts are also available.
<Off event table. >= // An event table class off_EventTbl : public off_HWCompResource { protected: // Allocates n contiguous events. off_Event *alloc( off_Protection *prot, natural_t n=1, off_ev_id_t at=OFF_EU_ID_NULL ); // Deallocates the n contiguous events starting at t. void free( off_Event *t, natural_t n=1 ); //Gets an event from its number. off_Event *fetch(off_ev_id_t n) const; };
Definesoff_EventTbl
(links are to index).
<Off event identifier. >= typedef off_eu_id_t off_ev_id_t;
Definesoff_ev_id_t
(links are to index).
<Off trap table. >= // Definition of traps for the current processor class off_TrapTbl: public off_EventTbl { public: // Allocates n contiguous Traps. off_Trap *alloc( off_Protection *prot, natural_t n=1, off_ev_id_t at=OFF_EU_ID_NULL ); // Deallocates the n contiguous traps starting at t. void free( off_Trap *t, natural_t n=1 ); //Gets a trap from its number. off_Trap *operator [](off_ev_id_t n) const; };
Definesoff_TrapTbl
(links are to index).
<Off interrupt table. >= // Definition of interrupts for the current processor. class off_IntTbl: public off_EventTbl { public: // Allocates n contiguous interrupts. off_Irq *alloc( off_Protection *prot, natural_t n=1, off_ev_id_t at=OFF_EU_ID_NULL ); // Deallocates the n contiguous interrupts starting at i. void free( off_Irq *i, natural_t n=1 ); //Gets an Irq from its number. off_Irq *operator [](off_ev_id_t n) const; };
Definesoff_IntTbl
(links are to index).
Event tables are a set of Event
s.
Each event is identified by an unique id and a reason.
<Off event. >= // An event, it could be a trap, an interrupt, ... class off_Event: public off_HWResUnit { public: // Sets h as the handler for this event void set_handler( off_EventHandler h); // Gets the handler. off_EventHandler get_handler(void); };
Definesoff_Event
(links are to index).
Events are handled by EventHandler
s which can rely either on
portals or, when delivered inside the kernel, on function calls.
<Off event handler. >= class off_EventHandler { public: // Calls the handler virtual void operator()( off_event_t ev, off_ev_reason_t reason, ... )=0; }; class off_FnEventHandler : public off_EventHandler { private: void (*handler)(off_event_t ev, off_ev_reason_t reason, ...); public: // Calls the handler virtual void operator()( off_event_t ev, off_ev_reason_t reason, ... ); }; class off_PrtlEventHandler : public off_EventHandler { private: // Its handler is // void (*handler)(off_event_t ev, off_ev_reason_t reason, ...); off_prtl_id_t handler; public: // Calls the handler virtual void operator()( off_event_t ev, off_ev_reason_t reason, ... ); };
Definesoff_EventHandler
,off_FnEventHandler
,off_PrtlEventHandler
(links are to index).
<Off events and reasons data types. >= typedef natural_t off_event_t; typedef natural_t off_reason_t;
Definesoff_event_t
,off_reason_t
(links are to index).
In turn, events may be either Trap
s or Irq
s (interrupts).
<Off Trap. >= class off_Trap: public off_Event { private: off_mdepTrap &t_mdep; };
Definesoff_Trap
(links are to index).
Although traps cannot be raised artificially, interrupts can (e.g. to notify users or interrupts ``caused by the kernel'', and not by the hardware).
<Off Irq. >=
class off_Irq: public off_Event {
private:
off_mdepIrq &i_mdep;
public:
void raise(void);
<Other public methods of off_Irq
. >
};
Definesoff_Irq
(links are to index).
Whenever an interrupt happens, we must choose either to evict the current shuttle to dispatch the interrupt to its handler or to defer interrupt delivering in favor of the current shuttle. To make a choice, both shuttles and interrupts have priority-levels. See section 5.1.3 for a discussion of interrupt priorities.
<Other public methods of off_Irq
. >= (<-U)
//Sets/Gets the priority level for this interrupt.
void set_prty(off_pl_t prty);
off_pl_t get_prty(void);
<Off interrupt priority level data type. >= typedef natural_t off_pl_t;
Definesoff_pl_t
(links are to index).
We have been seeing a few machine dependent objects like
off_mdepEvState
, off_mdepTrap
, and off_mdepIrq
. Those are
defined for a particular architecture and their meaning should be
clear by the context. Machine dependent objects are always prefixed by
off_mdep
and will be described along with the system
implementation.
Finally, there are some ``traps'' defined by Off++ and not by the hardware, some of them have appeared before and some will appear later on.
<Off Virtual traps. >= // Virtual (Off++ raised) exceptions enum off_ex_t { OFF_EX_UNUSED, // resource no longer referenced OFF_EX_MISSING, // resource missing (remote?) OFF_EX_UNAVAILABLE, // resource exhaust OFF_EX_RELOC, // resource has moved OFF_EX_XDT, // translation to native format needed OFF_EX_FWD, // request should be processed remotely OFF_EX_SCERR // system call failed };
DefinesOFF_EX_FWD
,OFF_EX_MISSING
,OFF_EX_RELOC
,OFF_EX_SCERR
,off_ex_t
,OFF_EX_UNAVAILABLE
,OFF_EX_UNUSED
,OFF_EX_XDT
(links are to index).
On the one hand, users can interact with trap and interrupt tables
through methods provided by Shuttles and Processors. Thus,
Shuttle
(see section 5.1) and and Processor
(see
section 4.5) behave as a facade because they provide simple
entry points to handle both trap and interrupt tables.
On the other hand, individual events (trap and interrupt entries) are exported to users by means of a wrapper class.
<Off event for users. >= ENTRY class off_uEvent : public off_uHWResUnit { public: void set_handler( off_prtl_id_t h, const off_Rights &r); off_prtl_id_t get_handler(const off_Rights &r) const; };
Definesoff_uEvent
(links are to index).
There is nothing specific for user traps as of this day.
<Off Trap for users. >= ENTRY class off_uTrap : public off_uEvent { public: };
Definesoff_uTrap
(links are to index).
On the other hand, interrupts export a few methods.
<Off Irq for users. >= ENTRY class off_uIrq: public off_uEvent { public: void raise(const off_Rights &r); void set_prty(off_pl_t prty, const off_Rights &r); off_pl_t get_prty(const off_Rights &r) const; };
Definesoff_uIrq
(links are to index).
DMA
objects are used to allocate DMALine
s. Note that DMA lines
are kept in the kernel although they could have been considered to be
a device (and kept in userland). The reason is efficiency, because we
do not want protection domain crossings just to program a DMA line.
Also, we believe that no flexibility is lost by placing this small
piece of code into the kernel.
<Off DMA table. >= // Allocation of DMA lines class off_DMA: public off_HWCompResource { public: // Allocates n DMA lines. off_DMALine *alloc(off_Protection *prot,natural_t n=1, off_dma_id_t at=OFF_EU_ID_NULL ); // Deallocates the n contiguous DMA lines starting at d. void free( off_DMALine *d, natural_t n=1 ); //Gets a DMA line from its number. off_DMALine *operator [](natural_t n) const; };
Definesoff_DMA
(links are to index).
<Off DMA line identifier. >= typedef off_eu_id_t off_dma_id_t;
Definesoff_dma_id_t
(links are to index).
DMA lines are very not well modeled, instead we provided a model of DMA pretty close to that of Intel based machines. Future Off portings to different architectures should provide feedback on how should DMA lines be modeled.
<Off DMA line. >= typedef off_mode_t dma_mode_t; class off_DMAline: public off_HWResUnit { public: // Enables/disables this dma line. void enable(void); void disable(void); // Programs a DMA operation. void program(off_pg_id_t addr, vm_offset_t length, dma_mode_t mode); // Gets # of copied or pending bytes. vm_offset_t get_copied(void); vm_offset_t get_pending(void); };
Definesoff_DMALine
(links are to index).
<Off DMA line mode data type. >= typedef natural_t dma_mode_t;
Definesdma_mode_t
(links are to index).
\subsection{DMA lines for plain users}
DMA related wrappers are straightforward.
<Off DMA table for users. >= ENTRY class off_uDMA : public off_uHWCompResource { public: off_uDMALine *alloc(off_Protection *prot, natural_t n=1, off_dma_id_t at=OFF_EU_ID_NULL); void free(off_uDMALine *pf, const off_Rights &r, natural_t n=1 ); off_uDMALine *fetch(off_dma_id_t d,const off_Rights &r) const; };
Definesoff_uDMA
(links are to index).
<Off DMA line for users. >= ENTRY class off_uDMALine : public off_uHWResUnit { public: // Enables/disables this dma line. void enable(const off_Rights &r); void disable(const off_Rights &r); // Programs a DMA operation. void program(off_pg_id_t addr, vm_offset_t length, dma_mode_t mode, const off_Rights &r); // Gets # of copied or pending bytes. vm_offset_t get_copied(void,const off_Rights &r); vm_offset_t get_pending(void,const off_Rights &r); };
Definesoff_uDMALine
(links are to index).
node
singleton will be the owner of
every processor in the system and will delegate ownership to user
level (or in-kernel) schedulers.
Processor pools provide Processor
s. Usually a processor will be
allocated to a user-level (or dynamically loaded in-kernel) scheduler which
will in turn implement a processor allocation policy.
<Off processor table. >= class off_ProcTbl: public off_HWCompResource { public: // Allocates n processors. off_Processor *alloc(off_Protection *prot,natural_t n=1, off_proc_id_t at=OFF_EU_ID_NULL); // Deallocates the n contiguous processors starting at p. void free( off_Processor *p, natural_t n=1 ); //Gets a processor from its number. off_Processor *operator [](off_proc_id_t n) const; };
Definesoff_ProcTbl
(links are to index).
<Off processor identifier. >= typedef off_eu_id_t off_proc_id_t;
Definesoff_proc_id_t
(links are to index).
To help schedulers, processors provide some methods which we now discuss.
First, a new shuttle can be explicitly installed in the processor
using the switch_to
method. This is to avoid using run queues when
they are not needed. Thus, multiprocessing is an optional feature.
Dedicated single-threaded nodes can avoid using run queues at all (no
processor multiplexing code will run).
When they are, a run queue is an ordered collection of shuttle identifiers. Each identifier will be accompanied by the number of clock ticks it is expected to run.
The queue will be run in a round robin fashion until the occurrence of
an event of interest to the scheduler. To let the processor know when
to stop the run queue and notify to the scheduler notify
should be
used. It specifies which events are to cause the stop and also, who
should be notified (either by portal or by direct function call when
inside the kernel).
In any case, get_current
will return a pointer to shuttle being
run (or to the last shuttle run, when idle).
<Off processor. >= class off_Processor : public off_HWResUnit { private: <Other private members ofoff_Processor
. > public: // Installs a new shuttle void switch_to(off_Shtl &new); // Executes a run queue void run(off_shtl_id_t s[], off_offset_t ticks[], natural_t n); // Arranges for the processor to notify the owner on this events void notify( off_sevent_t events, off_EventHandler sched); // Returns the current shuttle off_shtl_id_t get_current(void); <Other public methods ofoff_Processor
. > };
Definesoff_Processor
(links are to index).
After the processor has invoked the scheduler to notify the event, the scheduler may either install a new run queue or just adjust the members of the current one. By now, the only events defined are these ones (NB. these are not hardware events, but scheduling events).
<Off scheduling events. >= // Scheduling related events. enum off_sevent_t { OFF_SEV_TRAP=0x01,// Trap occurred. OFF_SEV_IRQ =0x02, // Interrupt occurred. OFF_SEV_YLD =0x04, // Current shuttle blocked. OFF_SEV_BLK =0x08, // Other than current shuttle blocked. OFF_SEV_AWK =0x10, // A shuttle got awakened. OFF_SEV_END =0x20 // Run queue exhausted. };
Definesoff_sevent_t
(links are to index).
Although it is an implementation issue, we will say that the run queue should be kept in the memory of the scheduler. The page the queue is in should be locked so that the processor could maintain a pointer to the run queue. In this way a user level scheduler may adjust the run queue just by writing its own memory without further system calls.
Finally, as we said when we discussed traps and interrupts, each
Processor
contains a couple of event tables, one for traps and
another one for interrupts.
<Other private members of off_Processor
. >= (<-U)
off_TrapTbl p_traps; // Processor trap table.
off_IntTbl p_irqs; // Processor interrupt table.
The trap and interrupt table for a given processor can be obtained
with a public method of the Processor
class:
<Other public methods of off_Processor
. >= (<-U)
//Returns a reference to the trap/interrupt event table.
off_TrapTbl *&get_traps_ref(void);
off_IntTbl *&get_irqs_ref(void);
\subsection{Processors and processor pools for plain users}
The processor pool wrapper is defined as follows
<Off processor table for users. >= class off_uProcTbl : public off_uHWCompResource { public: off_uProcessor *alloc(off_Protection *prot, natural_t n=1, off_proc_id_t at=OFF_EU_ID_NULL); void free(off_uProcessor *pf, const off_Rights &r, natural_t n=1 ); off_uProcessor *fetch(off_proc_id_t d,const off_Rights &r) const; };
Definesoff_uProcTbl
(links are to index).
Processors for users also include a few methods coming from trap and interrupt tables.
<Off processor for users. >= ENTRY class off_uProcessor : public off_uHWResUnit { // Installs a new shuttle void switch_to(off_shtl_id_t &new, const off_Rights &proc_r, const off_Rights &shtl_r); // Executes a run queue void run(off_shtl_id_t s[], off_offset_t ticks[], natural_t n, const off_Rights &proc_r, const off_Rights &shtl_r[]); // Arranges for the processor to notify the owner on this events void notify( off_sevent_t events, off_prtl_id_t sched, const off_Rights &proc_t); // Returns the current shuttle off_shtl_id_t get_current(const off_Rights &proc_r); off_uTrap *alloc_trap(off_Protection *prot, natural_t n=1, off_ev_id_t at=OFF_EU_ID_NULL); void free_trap(off_uTrap *t, const off_Rights &r, natural_t n=1 ); off_uTrap *fetch_trap(off_ev_id_t t,const off_Rights &r) const; off_uIrq *alloc_irq(off_Protection *prot, natural_t n=1, off_ev_id_t at=OFF_EU_ID_NULL); void free_irq(off_uIrq *t, const off_Rights &r, natural_t n=1 ); off_uIrq *fetch_irq(off_ev_id_t t,const off_Rights &r) const; };
Definesoff_uProcessor
(links are to index).
The overall picture is depicted in figure 5.1. The relationships there depicted will become clear as this chapter proceeds.
Figure 5.1: Relationship among Shuttles, Portals, DTLBs and other resources.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Shuttles provide flows of control which can be extended to support
user-level or in-kernel process abstractions. The ShtlSrv
(there
is one per node) maintains a pool of shuttles. Thus, the ShtlSrv
main task is to allocate shuttles.
<Off shuttle server. >=
// A shuttle server
class off_ShtlSrv: public off_AbsCompResource {
public:
// Allocates a shuttle.
off_Shtl *alloc(const off_Protection &prot,
natural_t n=1, off_shtl_id_t at=OFF_SHTL_NULL);
// Deallocates a shuttle.
void free(off_shtl_id_t *p, natural_t n=1);
// Locates a shuttle by its number.
off_Shtl *operator [](off_shtl_id_t id);
<Other public methods of off_ShtlSrv
. >
};
Definesoff_ShtlSrv
(links are to index).
A shuttle has a processor context and a set of properties (as described below) and it is the only schedulable entity provided by the kernel. Shuttles run when installed in the run queue of a given processor (as we saw in section 4.5).
As we will see in the next chapter, shuttles move from one protection
domain to another, including the kernel protection domain, by means of
portals. Besides, the use of upcalls from kernel to user space means
that the set of kernel and user activations (see
figure 5.2) will be mixed. Thus, a user shuttle may
call the kernel and the kernel my upcall back to the user during the
system call; the upcall routine may in turn issue new system calls,
and so on.
Figure 5.2: Shuttle activations: system calls and up-calls.
We may be interested in three different processor contexts when referring to a shuttle:
This is the one used when we are interested in the ``current'' state of the shuttle, no matter what its privilege level is.
This is the one used when we are interested either in the user state at the last system call or in the state to be returned to the user.
This is the one used when we are interested in the state of a shuttle while proceeding inside the kernel.
<Off off_Shtl
register accessors. >= (U->)
// Get a pointer to current, last user, or last kernel
// registers in the stack.
off_mdepPRegs *get_regref(void);
off_mdepPRegs *get_uregref(void);
off_mdepPRegs *get_kregref(void);
A reference to the current processor context can be obtained
with get_regref
. Last user and kernel registers are accessible by means of
get_uregref
and get_kregref
respectively. As these methods
return a reference to the register storage, no ``set_
'' methods
are provided.
Shuttles may also wait on abstract system resources due to some
reason. When a shuttle goes to sleep it should call block
. Later
on, someone else may call ready
to unblock it. These methods may
cause scheduling events (SEV_BLK
, SEV_YLD
, and SEV_AWK
-- see section 4.5) to be delivered to the processor
scheduler so that it could reschedule if desired.
Another effect of calling block
is that (even if no event is ever
raised) the shuttle will be ignored by its processor until a subsequent
call to ready
is performed.
The Processor
where the shuttle is running checks
the shuttle's blocked
flag and ignores the shuttle if it is set.
The blocked
flag means ``unable to run'' and can be used not only to
block a shuttle, but also during shuttle initialization, during shuttle
dismantling, etc. It should not be confused with the more abstract
``blocked'' state used in traditional process abstractions.
<Off shuttle. >= // A shuttle. class off_Shtl : public off_AbsResUnit { private: <Otheroff_Shtl
private members. > public: <Offoff_Shtl
register accessors. > <Offoff_Shtl
property accessors. > // Blocks due to some reason related to the given resource. // [May notify the scheduler] void block(off_id_t culprit, off_object_t culprit_type); // Awakes this shuttle. // [May notify the scheduler] void ready(void); // Is this shuttle blocked? inline boolean_t is_blocked(void) const; <Other public methods ofoff_Shtl
. > };
Definesoff_Shtl
(links are to index).
Shuttles must not run at the same time on more than one processor. To ensure it, running shuttles are flagged and such a flag is checked by every processor before switching to a new shuttle. Two new methods do the job:
<Other public methods of off_Shtl
. >= (<-U)
// Checks/Sets if this shuttle is running.
inline boolean_t is_running(void);
inline void run(void);
inline void stop(void);
\subsection{Shuttles for plain users. }
<Off shuttle server for users. >=
ENTRY class off_uShtlSrv: public off_uAbsCompResource {
public:
// Allocates a shuttle.
off_uShtl *alloc(const off_Protection &prot,
natural_t n=1, off_shtl_id_t at=OFF_SHTL_NULL);
// Deallocates a shuttle.
void free(off_uShtl *p, natural_t n=1, const off_Rights &r);
// Locates a shuttle by its number.
off_uShtl *fetch(off_shtl_id_t id, const off_Rights &r);
<Other public methods of off_uShtlSrv
. >
};
Definesoff_uShtlSrv
(links are to index).
<Off shuttle for users. >= ENTRY class off_uShtl : public off_uAbsResUnit { public: off_mdepPRegs get_reg(const off_Rights &r); off_mdepPRegs get_ureg(const off_Rights &r); off_mdepPRegs get_kreg(const off_Rights &r); void set_reg( off_mdepPRegs regs, const off_Rights &r); void set_ureg(off_mdepPRegs regs, const off_Rights &r); void set_kreg(off_mdepPRegs regs, const off_Rights &r); <Offoff_uShtl
property accessors. > // Blocks due to some reason related to the given resource. // [May notify the scheduler] void block(off_prtl_id_t culprit, const off_Rights &r); // Awakes this shuttle // [May notify the scheduler] void ready(const off_Rights &r); // Is this shuttle blocked? boolean_t is_blocked(const off_Rights &r) const; <Other public methods ofoff_uShtl
. > };
Definesoff_uShtl
(links are to index).
\subsection{Shuttle properties}
Properties are used to specify which resources (e.g. DTLBs, IO maps, etc.) should be readily available for the shuttle to run. Specific resource instances are identified by a property value.
Properties are named by values of type off_prop_t
and their values
by system identifiers of type off_id_t
.
<Off shuttle property identifer. >= // Properties typedef natural_t off_prop_t;
Definesoff_prop_t
(links are to index).
The context switch code is guaranteed to call a property switch
function pswitch
for every property changing its value (i.e. for
every resource whose instance must be changed) in the
context switch.
Two additional functions, pset
and pclr
, are provided for
those cases when the hardware is switching the context automatically
but some work should be done to install (or deinstall) a property
value in a given shuttle.
Any resource provider implementing this interface can be
considered to be a property provider.
<Off shuttle property server. >= // The interface of a property server. signature off_ShtlPropSrv { // Switches property values. // Returns either 0 or an error code. int pswitch(off_id_t from, off_id_t to, off_Shtl &s); // Set or clear the property at a given shuttle. int pset(off_id_t pval, off_Shtl &s, const off_Rights &r); int pclr(off_id_t pval, off_Shtl &s); // Should we call to pswitch? boolean_t needs_switch(void); };
Definesoff_ShtlPropSrv
(links are to index).
needs_switch
method is provided to save some calls to
pswitch
. Those properties which don't need to implement
pswitch
may provide a trivial (i.e. empty) implementation and
arrange for needs_switch
to return true. When a new property is
being defined, the shuttle server accepting the definition will use
needs_switch
to determine whether it should arrange for
pswitch
to be called or not.
To define the property being implemented as a shuttle property, a
property implementor must contact ask the shuttle server to define it.
The ShtlSrv
must include now this method to support property
definitions. Indeed, we include two new methods. One to define
properties implemented inside the kernel (which may use a simple
procedure call) and another one to define properties which might be
implemented outside the
kernel (which must use portals).
<Other public methods of off_ShtlSrv
. >= (<-U)
// Defines a (new) shuttle property.
void define_kprop(off_ShtlPropSrv &implementor,
off_prop_t id, off_Protection &p);
void define_uprop(off_prtl_id_t implementor,
off_prop_t id, off_Protection &p);
void undefine_prop(off_prop_t id, const off_Rights &r);
If foreign shuttle servers ever find a shuttle with an unknown property being used, they will ask to other shuttle servers found in the net for the property definition.
Property values are stored in property sets, of type ShtlPSet
. A
ShtlPSet
is an ordered (by property id) collection of property
values so that context switch could be done quickly. To save some
more time in context switches, ShtlPSet
s may be shared. Thus they
must be reference counted.
<Off shuttle property set. >= // A set of property values class off_ShtlPSet { private: <Off private members for reference counting objects. > protected: <Off protected methods for reference counting objects. > public: <Off public methods for reference counting objects. > // Adds a new property to this set. void add(off_prop_t p, off_id_t val); // Deletes a new property to this set. void del(off_prop_t p); // Gets the value of the given property or OFF_SHTL_NULL. off_id_t operator[](off_prop_t p) const; // Gets the number of the last property in the set. off_prop_t get_maxprop(void); };
Definesoff_ShtlPSet
(links are to index).
Each shuttle has its own (possibly shared) ShtlPSet
which can be
accessed by the accessor get_psetref
.
<Off off_Shtl
property accessors. >= (<-U) [D->]
inline off_ShtlPSet *&get_psetref(void);
Some time can be saved in context switches avoiding the comparison of
property arrays. Thus, we do not include the ShtlPSet
in the
shuttle but, instead, a reference to it. So, property sets can be shared
and a single pointer comparison can check if we have the same property arrays.
<Other off_Shtl
private members. >= (<-U)
off_ShtlPSet *s_pset; // Shuttle properties.
To provide more convenient entry points for users, some methods are included to get and set property values.
<Off off_Shtl
property accessors. >+= (<-U) [<-D->]
// Gets/Sets the value for a given property.
off_id_t get_prop(off_prop_t p) const;
void set_prop(off_prop_t p, off_id_t val);
Another one will arrange for every property to be shared between two shuttles.
<Off off_Shtl
property accessors. >+= (<-U) [<-D]
// Resets the properties so that they are share with other shuttle.
void dup_props(const off_Shtl &s);
Not every property needs to call its switch function on context switch. Those properties which do not need switch functions can be placed last in the array and the comparison stop as soon as the first property without a switch function is reached.
Finally, although we have not mentioned it, properties can be used to support user-level properties by arranging a user level function to be called at the beginning of every quantum. That function can be used to establish user-level context pieces if needed (e.g. to set a pointer to per-thread data or to establish a temporary memory map so that every thread could have a private (secure) store). When such a property is being used, its property value might identify the (user-space) argument to the function setting the user-level context.
\subsubsection{Allocation of ShtlPSet
s}
Property sets are stored into a ShtlPSetSrv
used internally by the
shuttle server. It is considered to be a composite resource but it is
neither a hardware resource container nor an abstract resource
container.
<Off shuttle property sets server. >= class off_ShtlPSetSrv : public off_CompResource { public: // Allocates a PSet off_ShtlPSet *alloc(void); // Deallocates a PSet. void free(off_ShtlPSet *p); };
Definesoff_ShtlPSetSrv
(links are to index).
\subsubsection{Shuttle properties for plain users}
For system users, properties are handled by shuttles. Thus there are
few more methods in uShtl
.
<Other public methods of off_uShtl
. >= (<-U) [D->]
off_id_t get_prop(off_prop_t p, const off_Rights &r) const;
void set_prop(off_prop_t p, off_id_t val,
const off_Rights &shtl_r, const off_Rights &prop_r);
void dup_props(const off_shtl_id_t &s,
const off_Rights &this_r, const off_Rights &s_r);
Property definition is also available for system users. The
uShtlSrv
provides this service.
<Other public methods of off_uShtlSrv
. >= (<-U)
void define_prop(off_prtl_id_t implementor, off_prop_t id,
const off_Protection &r);
void undefine_prop(off_prop_t id, const off_Rights &r);
\subsubsection{Shuttles and Traps}
As we have seen in section 4.5, each Processor
includes
both a TrapTbl
and an IntTbl
. As in certain circumstances we
may desire to dispatch traps on a per-shuttle basis, there is a
predefined shuttle property (implemented by the shuttle trap table
server ShtlTrapTblSrv
) defining those traps in which the current
shuttle has interest.
<Off Shuttle trap table server. >= class off_ShtlTrapTblSrv : public off_CompResource { public: // Allocates a trap table. off_TrapTbl *alloc(void); // Deallocates a trap table. void free(off_TrapTbl *t); // To implement the trap table property: // Switches property values. // Returns either 0 or an error code. int pswitch(off_trapt_id_t from, off_trapt_id_t to, off_Shtl &s); // Set or clear the property at a given shuttle. int pset(off_trapt_id_t pval, off_Shtl &s, const off_Rights &r); int pclr(off_trapt_id_t pval, off_Shtl &s); // Should we call to pswitch? boolean_t needs_switch(void); };
Definesoff_ShtlTrapTblSrv
(links are to index).
Thus, to make certain traps to be ``redefined'' for certain shuttles,
a TrapTbl
should be allocated and installed as a property in the
those shuttles. Note that those traps already ``allocated'' in the
Processor
trap table will not be defined on a per-shuttle basis.
Whenever a trap occurs, the processor trap table is has priority over
the shuttle trap table (eg. Shuttles may not bypass the DMM
handling of page fault traps even if they include an entry for such a
trap in their trap tables).
When used, trap tables are considered to be part of the shuttle itself: The trap server is never seen outside the shuttle server.
Whenever an interrupt happens, we must choose either to evict the
current shuttle to dispatch the interrupt to its handler or to defer
interrupt delivering in favor of the current shuttle. To make a
choice, both shuttles and interrupts have priority-levels. On
processor interrupt, the lower priority wins (considering the executing shuttle
priority level and the interrupt priority level).
To arbitrate, no shuttle may set a priority level lower than the
lowest existing interrupt priority level. No interrupt may be adjusted
to a priority level lower than the lowest existing shuttle
interrupt-priority level. In few words, when users request a priority
level (be it for a shuttle or for an interrupt) the policy is FCFS;
e.g. If interrupt already i
has a priority level n
, no shuttle
may set its interrupt priority level to numbers below n+1
.
To implement this feature we must include priority levels as a shuttle property predefined by the shuttle server.
<Off Shuttle interrupt priority level server. >= class off_ShtlIntPrtySrv { public: // Switches property values. // Returns either 0 or an error code. int pswitch(off_pl_id_t from, off_pl_id_t to, off_Shtl &s); // Set or clear the property at a given shuttle. int pset(off_pl_id_t pval, off_Shtl &s, const off_Rights &r); int pclr(off_pl_id_t pval, off_Shtl &s); // Should we call to pswitch? boolean_t needs_switch(void); };
Definesoff_ShtlIntPrtySrv
(links are to index).
The interrupt priority is considered to be inside the shuttle. The interrupt priority server is never seen outside the shuttle server.
The trap table and the interrupt priority are considered (and seen) as
predefined properties by system users. Methods get_prop
,
set_prop
and dup_props
are well behaved with respect to them.
With respect to interrupt priorities, users may set the value of the interrupt priority level property without further operations (they don't need to allocate priority levels as they are predefined).
However, Trap tables are neither allocated nor deallocated by
users. In fact, trap tables are not even seen by users. To handle
per-shuttle trap definitions, the shuttle acts as a facade for its
users mimicing the behavior of Processor
in this respect.
<Other public methods of off_uShtl
. >+= (<-U) [<-D]
off_uTrap *alloc_trap(off_Protection *prot,
natural_t n=1, off_ev_id_t at=OFF_EU_ID_NULL);
void free_trap(off_uTrap *t, const off_Rights &r, natural_t n=1 );
off_uTrap *fetch_trap(off_ev_id_t t,const off_Rights &r) const;
% sp replaced by sp pools + final notes about access modes and stacks -Nemo
A portal is a communication endpoint. It can be attached to a handler and
also invoked like an interrupt. Portals are contained in the portal
server, an instance of the PrtlSrv
class whose main task is to allocate
portals.
<Off portal server. >= class off_PrtlSrv : public off_AbsCompResource { public: // Allocates a portal. off_Prtl *alloc(const off_Protection &p, natural_t n=1, off_prtl_id_t at=OFF_PRTL_NULL); // Deallocates a portal void free(off_Prtl *p, natural_t n=1); // Gets a portal by its number off_Prtl *operator[](off_prtl_id_t p) const; };
Definesoff_PrtlSrv
(links are to index).
Portals support both protected control transfers (PCTs) and asynchronous message delivering.
When used for PCTs, the sender shuttle changes its properties on the fly and becomes itself a flow of control in the receiver protection domain. After the handler completes, the shuttle recovers its original state and continues its execution.
Callees may, at any time, delegate the execution of PCTs to other
servers. In this case, one single shuttle can cross several protection
domains and come back directly to its
original site. To delegate a PCT reply to a different server,
pct_pass
can be used.
Finally, when no reply is ever expected, deliver
should be used
instead. It will simply save (in the callee context) what users are
unable to save safely (like eflags
on Intel architectures) and
install a simple activation frame for the handler on the receiver's
shuttle. In this case, the caller will resume execution as soon as the
callee has been notified; the message is sent but may be not delivered
yet.
The handler information present in the portal is basically a program
counter (pc
), a pointer to a pool of stack pointers (spp
), an
access mode mode
, and an (optional) set of properties (props
).
Each property value in props
will cause the sender's property to
be set to that value during portal invocation. Remaining sender
properties will maintain their original value. As it is most likely
that the address space property (see section 5.3) will be
present, that property is factored out of the property array to gain
some performance. On nodes where the kernel is compiled without support
for multiple protection domains, a null value is used the such
property is ignored. When the
props
set is not present in the
portal, it will not support PCTs but just asynchronous message
delivering.
That is enough to support PCTs, but for asynchronous message
delivering it is necessary that the portal have a shuttle identifier.
Thus, when deliver
is used, the handler (known by the portal
information) will run not on the sender shuttle but on the receiver.
If the receiver shuttle identifier is not specified, the portal will
not support asynchronous delivering.
<Off portal. >=
class off_Prtl : public AbsResUnit {
private:
<Other off_Prtl
private members. >
public:
// (Re)sets the portal handler.
void set_hndlr(vm_offset_t pc, vm_offset_t spp, off_mode_t mode,
off_dtlb_id_t vas, natural_t maxmsgsz=0,
const off_ShtlPSet *props=NULL,
off_shtl_id_t s=OFF_SHTL_NULL);
// (Locally) delivers a message to this portal using a PCT.
// (May reset the stack)
void pct(off_Shtl &sender,
void *msg, natural_t msgsz,
void *reply, natural_t replysz, natural_t tmout) const;
// Pass the PCT to a different portal.
// Reply to the initial PCT will be issued from there. (May reset the stack)
void pct_pass(off_Shtl &sender, off_prtl_id_t p) const;
// Delivers an event to a portal (one-way PCT).
// (May reset the stack)
void deliver(off_Shtl &sender,
natural_t rq_nwords, void *rq_msg, natural_t tmout) const;
};
Definesoff_Prtl
(links are to index).
Portals may be used to enforce object access control. When the handler
is attached to the portal, the permitted access mode through this
portal is specified. The kernel does not enforce it. Instead, the
access mode will be placed in a pre-specified register so that the
user handler can check the validity of the access. For example, a file
object may create a read-only access point using a portal specifying
OFF_OP_R
as the only permitted access mode. We could have omitted
this access mode
feature from portals, but it does not change the
portal semantics and avoid those calls which will fail to proceed
further.
The stack pool pointer (spp
) portal member points to an array of
initial stack pointers provided by the handler. Each incoming
invocation will use, as its own user stack, one of the stacks found
there. The pool of stacks can be of any length, but it is assumed that
the last stack pointer will be invalid and its value will be 1
.
Upon portal invocation, the kernel selects one of the non-zero
pointers found in the pool and then resets it to 0
. Since a stack
pointer with value 0
will be ignored, this is a safe mechanism to
avoid race conditions during stack selection. To allow the stack to be
reused, the user-level handler must (atomically) write the original
stack pointer value back to the stack pool.
When there are no more free stacks, any further call on the portal blocks.
These blocked shuttles are queued (using shuttle identifiers instead of
pointers, so that the list can span several nodes) using the p_wstack
private member. When a new stack is made available, they are awakened.
<Other off_Prtl
private members. >= (<-U)
off_Bureaucrat p_wstack; // No stack available on a PCT portal
\subsection{Portals for plain users}
The portal server has a ``conventional'' wrapper.
<Off portal server for users. >= ENTRY class off_uPrtlSrv : public off_uAbsCompResource { public: off_uPrtl *alloc(const off_Protection &prot, natural_t n=1, off_prtl_id_t at=OFF_PRTL_NULL); void free(off_uPrtl *p, natural_t n=1, const off_Rights &r); off_uPrtl *operator [](off_prtl_id_t id, const off_Rights &r); };
Definesoff_uPrtlSrv
(links are to index).
The portal wrapper is the most special one in the kernel. Only the non-delivering methods are exported to users as usually. Methods used for message delivering and PCT are implemented without wrapper services (because they are used to implement the portal delivering mechanism used by the wrappers).
<Off portal for users. >= ENTRY class off_uPrtl : public uAbsResUnit { public: // (Re)sets the portal handler. void set_hndlr(vm_offset_t pc, vm_offset_t spp, off_mode_t mode, off_dtlb_id_t vas, natural_t maxmsgsz=0, const off_ShtlPSet *props=NULL, off_shtl_id_t s=OFF_SHTL_NULL, const off_Rights &prtl_r, const off_Rights &shtl_r, const off_Rights &dtlb_r); };
Definesoff_uPrtl
(links are to index).
Off++ memory management is based on Distributed TLBs (DTLBs) where TLB stands for Translation Lookaside Buffer, a cache of virtual to physical memory address translations [12].
Each Distributed Memory Manager (DMM, for short) is actually a pool of DTLBs. However, DTLBs are an optional feature. Machines dedicated to a single application (like a dedicated file server, an embedded controller, etc.) can use a single protection domain for efficiency purposes.
A DMM multiplexes the address translation hardware among the existing DTLBs in cooperation with the shuttle server (as there is usually one processor register used to identify the protection domain).
<Off DMM. >=
class off_DMM : public off_AbsCompResource {
public:
// Allocates a DTLB.
off_DTLB *alloc(const off_Protection &prot,
natural_t n=1, off_dtlb_id_t at=OFF_DTLB_NULL);
// Deallocates a DTLB.
void free(off_DTLB *DTLB, natural_t n=1);
//Gets a DTLB from its number.
off_DTLB *operator [](off_dtlb_id_t id);
//Gets the size of pages being translated.
vm_size_t get_pgsize(void);
<Off off_DMM
shuttle property methods. >
};
Definesoff_DMM
(links are to index).
As \dtlb{}s are valid shuttle property values, the \dmm{} also
implements the SthlPropSrv
interface.
<Off off_DMM
shuttle property methods. >= (<-U)
// Property interface routines (off_ShtlPropSrv signature).
// Switches property values.
// Returns either 0 or an error code.
int pswitch(off_dtlb_id_t from, off_dtlb_id_t to, off_Shtl &s);
// Set or clear the property at a given shuttle.
int pset(off_dtlb_id_t pval, off_Shtl &s, const off_Rights &r);
int pclr(off_dtlb_id_t pval, off_Shtl &s);
// Should we call to pswitch?
boolean_t needs_switch(void);
A DTLB is a set of translations. Not every existing translation must be present in it. The \dtlb{} should be considered to be a cache of the translations being used. However, on architectures with page tables, the cache may actually hold a copy of every existing translation to local memory.
Page faults and remaining events for the DTLB will be delivered to
its owner (see off_Resource
, for details).
<Off DTLB. >= class off_DTLB : public AbsResUnit { public: // Installs a set of (contiguous) address translations. void install(vm_offset_t va, off_pg_id_t pa, off_mode_t access_mode, natural_t n=1); // Deinstalls a set of (contiguous) address translations. void invalidate(vm_offset_t va, natural_t n=1); // Changes the access mode bits for the given translations void set_mode(vm_offset_t va, off_mode_t access_mode, natural_t n=1); };
Definesoff_DTLB
(links are to index).
Note that these addresses can refer also to remote memory. Remote
translations will make intensive use of the \dmm{}'s relocation table
(one of such tables is present in every AbsCompResource
object, as
we saw in section 2.5).
Users can handle their DTLBs through the DTLB and DMM wrappers
<Off DMM for users. >= ENTRY class off_uDMM : public off_uAbsCompResource { public: // Allocates a DTLB. off_uDTLB *alloc(const off_Protection &prot, natural_t n=1, off_dtlb_id_t at=OFF_DTLB_NULL); // Deallocates a DTLB. void free(off_uDTLB *DTLB, natural_t n=1, const off_Rights &r); //Gets a DTLB from its number. off_uDTLB *operator [](off_dtlb_id_t id, const off_Rights &r); //Gets the size of pages being translated. vm_size_t get_pgsize(const off_Rights &r,); };
Definesoff_uDMM
(links are to index).
<Off DTLB for users. >= ENTRY class off_uDTLB : public off_uAbsResUnit { public: // Installs a set of (contiguous and w/ the same access rights) // address translations. void install(vm_offset_t va, off_pg_id_t pa, off_mode_t access_mode, natural_t n=1, const off_Rights &dtlb_r, const off_Rights &pa_r ); // Deinstalls a set of (contiguous) address translations. void invalidate(vm_offset_t va, natural_t n=1, const off_Rights &r); // Changes the access mode bits for the given translations void set_mode(vm_offset_t va, off_mode_t access_mode, natural_t n=1, const off_Rights &dtlb_r, const off_Rights &pa_r); };
Definesoff_uDTLB
(links are to index).
off_DMM
shuttle property methods. >: U1, D2
off_Shtl
property accessors. >: U1, D2, D3, D4
off_Shtl
register accessors. >: D1, U2
off_uShtl
property accessors. >: U1
off_Node
private members. >: U1
off_Node
protected methods. >: U1, D2, D3
off_Node
public methods. >: U1, D2
off_Prtl
private members. >: U1, D2
off_Shtl
private members. >: U1, D2
off_CompResource
. >: U1
off_Processor
. >: U1, D2
off_Resource
. >: U1, D2
off_Irq
. >: U1, D2
off_Processor
. >: U1, D2
off_Resource
. >: U1, D2
off_Shtl
. >: U1, D2
off_ShtlSrv
. >: U1, D2
off_uResource
. >: D1, U2, D3
off_uShtl
. >: U1, D2, D3
off_uShtlSrv
. >: U1, D2
off_u
Resource
off_
we will omit it in this text.
_
''). Those members which may be in
more than one class -like the reference counter- will be just
prefixed by an underscore.
A Detailed Description of Off++, a Distributed Adaptable kernel
This document was generated using the LaTeX2HTML translator Version 96.1-h (September 30, 1996) Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
The command line arguments were:
latex2html -show_section_numbers -split 0 off-full.
The translation was initiated by Francisco J. Ballesteros on Sat Dec 27 22:04:31 MET 1997
Francisco J. Ballesteros