\section{Parallelization}
\begin{figure}
\center
\begin{tabular}{cccc}
\includegraphics[width=0.21\textwidth]{fig/macro.pdf} &
\includegraphics[width=0.21\textwidth]{fig/local1.pdf} &
\includegraphics[width=0.21\textwidth]{fig/global.pdf} &
\includegraphics[width=0.21\textwidth]{fig/local2.pdf} \\
(a)&(b)&(c)&(d)\\
\end{tabular}
\caption{(a) A triangular macro mesh, (b) domain decomposition after six global refinements and overlap for partition 0, (c) composite mesh after the adaptation loop, (d) local mesh of process 0 after the adaptation loop}
\label{f:full domain covering meshes2}
\end{figure}
Before we start with the application example, we give a short overview of the parallelization approach of full domain covering meshes used in AMDiS. The approach can be summarized by the following steps:
\begin{itemize}
\item Create a partitioning level by some adaptive or global refinements of the macro mesh.
\item Create a partitioning with approximately the same number of elements in each partition.
\item Every process computes on the whole domain, but adaptive mesh refinements are allowed only within the local partition, including some overlap.
\item After each adaptive iteration, a new partitioning can be computed if the parallel load balance becomes too poor. Partitionings are always computed at the same mesh level. The partitioning elements are weighted by the number of corresponding leaf elements (elements with no children).
\item At the end of each timestep, or at the end of the parallel computation, a global solution is constructed by a partition of unity method (a weighted sum over all process solutions); a code sketch of this step is given after the build notes below.
\end{itemize}
Figure \ref{f:full domain covering meshes2} illustrates the concept of full domain covering meshes.

The so-called three level approach decouples the following levels:
\begin{enumerate}
\item Partitioning level: Basis for the partitioning. The partitioning level is built by the current leaf elements of the mesh when the parallelization is initialized.
\item Global coarse grid level: Number of global refinements starting from the partitioning level for the whole domain. The global coarse grid level builds the coarsest level for all future computations.
\item Local coarse grid level: Number of global refinements starting from the partitioning level within the local domain, including neighbor elements. The local coarse grid level reduces the size of the partition overlap.
\end{enumerate}
In Figure \ref{f:threelevels2}, an example of the three level approach is given. The dashed line shows the overlap of size 1 for the local partition, computed at the partitioning level.

The parallelization in AMDiS uses the {\it Message Passing Interface} MPI, see
\begin{lstlisting}{}
http://www-unix.mcs.anl.gov/mpi/ .
\end{lstlisting}
For the partitioning, the parallel graph partitioning library ParMETIS is used, see
\begin{lstlisting}{}
http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview .
\end{lstlisting}
To use AMDiS in parallel, it must be configured like
\begin{lstlisting}{}
> ./configure --prefix=<install path> --with-mpi=<MPI path> --with-parmetis=<ParMETIS path>
\end{lstlisting}
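Before turning to compilation and program start, here is the sketch of the partition of unity step promised in the overview above. It is purely illustrative and not the AMDiS API: the function name and the assumption that every process holds a full global DOF vector plus a weight vector (weights equal to $1$ inside the own partition, $0$ outside the overlap, summing to $1$ at every DOF) are invented for this example.
\begin{lstlisting}{}
#include <vector>

// Sketch only: combine the process solutions u_p into a global
// solution u = sum_p w_p * u_p, where the weights w_p form a
// partition of unity (they sum to 1 at every DOF).
std::vector<double> partitionOfUnitySum(
    const std::vector<std::vector<double> >& localSolutions,
    const std::vector<std::vector<double> >& weights)
{
  std::vector<double> global(localSolutions[0].size(), 0.0);
  for (unsigned int p = 0; p < localSolutions.size(); p++)
    for (unsigned int i = 0; i < global.size(); i++)
      global[i] += weights[p][i] * localSolutions[p][i];
  return global;
}
\end{lstlisting}
In a real parallel run the weighted sum is of course formed by message passing rather than by collecting all solutions in one place; the sketch only shows the arithmetic of the weighted sum.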
The parallel application (in this example \verb+parallelheat+) must be compiled with the MPI C++ compiler \verb+mpiCC+ and linked against the ParMETIS library. Then the application can be started with \verb+mpirun+:
\begin{lstlisting}{}
> mpirun -np <number of processes> ./parallelheat init/parallelheat.dat.2d
\end{lstlisting}
Now we start with the example. We want to solve a time dependent problem as described in Section \ref{s:heat}. As right hand side function $f$ we choose the {\it moving source}
\[
f(x,t) = \sin(\pi t)\, e^{-10 (x - \vec{t})^2}
\]
with $0 \leq t \leq 1$ and $\vec{t}$ a vector with all components set to $t$. The maximum of this function moves from $(0,0)$ to $(1,1)$ within the specified time interval. We use 4 parallel processes.

\begin{figure}
\center
\includegraphics[width=0.95\textwidth]{fig/threelevels.pdf}
\caption{Example for the three levels of partitioning. The partitioning mesh is globally refined twice to get the global coarse grid level. Then two further global refinement steps, applied on $\Omega_i$ and its direct neighbor elements (in each step), result in the local coarse grid level.}
\label{f:threelevels2}
\end{figure}

\begin{table}
\center
\begin{tabular}{|cc|c|c|c|}
\hline
&& {\bf 1d} & {\bf 2d} & {\bf 3d} \\
\hline
{\bf source code} & {\tt src/} & \multicolumn{3}{|c|}{\tt parallelheat.cc} \\
\hline
{\bf parameter file} & {\tt init/} & \tt - & \tt parallelheat.dat.2d & \tt - \\
\hline
{\bf macro file} & {\tt macro/} & \tt - & \tt macro.stand.2d & \tt - \\
\hline
{\bf output files} & {\tt output/} & \multicolumn{3}{|c|}{\tt parallelheat\_proc<p>\_<t>.mesh/.dat} \\
\hline
\end{tabular}
\caption{Files of the {\tt parallelheat} example. In the output file names, {\tt <p>} is replaced by the process number and {\tt <t>} is replaced by the time.}
\label{t:parallelheat}
\end{table}

\subsection{Source code}
In this section, the interesting aspects of the file \verb+parallelheat.cc+ are described. First, the moving functions $f$ and $g$ are defined.
\begin{lstlisting}{}
class G : public AbstractFunction<double, WorldVector<double> >,
          public TimedObject
{
public:
  /// Implementation of AbstractFunction::operator().
  double operator()(const WorldVector<double>& argX) const 
  {
    WorldVector<double> x = argX;
    int dim = x.getSize();

    // Shift the coordinates by the current time: x -> x - t*1.
    for (int i = 0; i < dim; i++)
      x[i] -= *timePtr;

    return sin(M_PI * (*timePtr)) * exp(-10.0 * (x * x));
  }
};

class F : public AbstractFunction<double, WorldVector<double> >,
          public TimedObject
{
public:
  F(int degree) : AbstractFunction<double, WorldVector<double> >(degree) {}

  /// Implementation of AbstractFunction::operator().
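  // With r2 = |x - t*1|^2, the Laplacian of u = sin(pi t) exp(-10 r2)
  // is (400 r2 - 20 dim) u. The return statement below therefore
  // assembles the right hand side as f = ut - Laplace(u).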
  double operator()(const WorldVector<double>& argX) const 
  {
    WorldVector<double> x = argX;
    int dim = x.getSize();
    for (int i = 0; i < dim; i++)
      x[i] -= *timePtr;

    double r2 = (x * x);
    double ux = sin(M_PI * (*timePtr)) * exp(-10.0 * r2);
    double ut = M_PI * cos(M_PI * (*timePtr)) * exp(-10.0 * r2);
    return ut - (400.0 * r2 - 20.0 * dim) * ux;
  }
};
\end{lstlisting}
The main program starts with \verb+MPI::Init+ to initialize MPI, and ends with \verb+MPI::Finalize+ to finalize MPI.
\begin{lstlisting}{}
int main(int argc, char** argv)
{
  MPI::Init(argc, argv);

  ...

  std::vector<DOFVector<double>*> vectors;
  ParallelProblemScal parallelheat("heat->parallel", heatSpace, heat,
                                   vectors);

  AdaptInstationary *adaptInstat =
    new AdaptInstationary("heat->adapt",
                          &parallelheat,
                          adaptInfo,
                          &parallelheat,
                          adaptInfoInitial);

  ...

  parallelheat.initParallelization(adaptInfo);
  adaptInstat->adapt();
  parallelheat.exitParallelization(adaptInfo);

  MPI::Finalize();
}
\end{lstlisting}
The parallel problem \verb+parallelheat+ has to know the sequential instationary problem \verb+heat+ and the space problem \verb+heatSpace+. The vector \verb+vectors+ stores pointers to DOF vectors that are used by the operator terms; the values of these vectors have to be exchanged during repartitioning. The solution of the last timestep is considered automatically, hence \verb+vectors+ is empty here. The adaptation loop \verb+adaptInstat+ uses \verb+parallelheat+ both as iteration interface and as time interface. Before the adaptation loop starts, the parallelization is initialized by \verb+initParallelization+. After the adaptation loop has finished, the parallelization is finalized by \verb+exitParallelization+.

\subsection{Parameter file}
We use the parameter file described in Section \ref{s:heat parameter} as basis for the file \verb+parallelheat.dat.2d+ and describe the relevant changes for the parallel case.
\begin{lstlisting}{}
heatMesh->global refinements: 6
\end{lstlisting}
The macro mesh is globally refined 6 times to create the partitioning mesh, which is used as basis for the domain decomposition. In the following, one can see the parameters for the parallel problem \verb+heat->parallel+.
\begin{lstlisting}{}
heat->parallel->upper part threshold: 1.5
heat->parallel->lower part threshold: 0.66
\end{lstlisting}
These two parameters determine the upper and lower partitioning thresholds $rt_{high}$ and $rt_{low}$. If at least one process exceeds $rt_{high} \cdot sum_{av}$ elements or falls below $rt_{low} \cdot sum_{av}$ elements, a repartitioning must be done ($sum_{av}$ is the average number of elements per partition).
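The repartitioning criterion just described can be written down as a small check. The following is a minimal sketch under the stated rule; it is not AMDiS code, and the function name and argument layout are invented for this example.
\begin{lstlisting}{}
#include <vector>

// Sketch only: repartition if at least one process exceeds
// rtHigh * sumAv elements or falls below rtLow * sumAv elements,
// where sumAv is the average number of elements per partition.
// For this example, rtHigh = 1.5 and rtLow = 0.66.
bool needsRepartitioning(const std::vector<int>& elementsPerProcess,
                         double rtHigh, double rtLow)
{
  double sumAv = 0.0;
  for (unsigned int p = 0; p < elementsPerProcess.size(); p++)
    sumAv += elementsPerProcess[p];
  sumAv /= elementsPerProcess.size();

  for (unsigned int p = 0; p < elementsPerProcess.size(); p++)
    if (elementsPerProcess[p] > rtHigh * sumAv ||
        elementsPerProcess[p] < rtLow * sumAv)
      return true;

  return false;
}
\end{lstlisting}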
%If adaptive thresholds are used, these values are also the limits for the
%threshold adaptation. The start value for the upper threshold is also the
%minimum for the upper threshold; the start value for the lower threshold is
%also the maximum for the lower threshold.
%\begin{lstlisting}{}
%heat->parallel->adaptive thresholds: 1
%heat->parallel->threshold inc factor: 2.0
%heat->parallel->threshold dec factor: 0.5
%heat->parallel->repart time factor: 10.0
%\end{lstlisting}
%If \verb+adaptive thresholds+ is set to $1$, the \verb+threshold inc factor+
%and \verb+threshold dec factor+ determine the increment and decrement factors
%for the thresholds. The value of \verb+repart time factor+ is the desired
%relation between repartitioning time and computation time.
\begin{lstlisting}{}
heat->parallel->global coarse grid level: 0
heat->parallel->local coarse grid level: 4
\end{lstlisting}
Here, the parameters for the three level approach are given. The partitioning level has already been determined by \verb+heatMesh->global refinements+. The value of \verb+global coarse grid level+ is set to $0$, and \verb+local coarse grid level+ is set to $4$. This means that 4 global refinements are done within the local partition of each process (including neighboring elements) to reduce the overlap size. No further refinements are done outside of the local partitions.
\begin{lstlisting}{}
heat->adapt->timestep: 0.1
heat->adapt->min timestep: 0.1
heat->adapt->max timestep: 0.1
heat->adapt->start time: 0.0
heat->adapt->end time: 1.0
\end{lstlisting}
We use a fixed timestep of $0.1$. The start time is set to $0.0$, the end time to $1.0$. Thus, we have 10 timesteps (not counting the initial solution at time $0.0$).
\begin{lstlisting}{}
heat->space->output->filename: output/heat
\end{lstlisting}
For the output, the prefix \verb+output/heat+ is used. Each process appends its process ID, then the time, and finally the file extensions. The result of the initial problem of process 0, for example, is written to \verb+output/heat_proc0_00.000.mesh+ and \verb+output/heat_proc0_00.000.dat+.

\subsection{Macro file}
We again use the macro file \verb+macro/macro.stand.2d+, which was described in Section \ref{s:ellipt macro}.

\subsection{Output}
For each timestep and for each process, one mesh file and one value file are written (total number of files: $11 \cdot 4 \cdot 2 = 88$). In Figure \ref{f:parallel heat}, the local process solutions for $t=0.1$, $t=0.5$ and $t=0.9$ are visualized after the partition of unity has been applied. One can see that the partitioning follows the moving source.
\begin{figure}
\center
\begin{tabular}{ccc}
\includegraphics[width=0.3\textwidth]{fig/parallelheat_01.jpg}&
\includegraphics[width=0.3\textwidth]{fig/parallelheat_05.jpg}&
\includegraphics[width=0.3\textwidth]{fig/parallelheat_09.jpg}\\
(a)&(b)&(c)
\end{tabular}
\caption{Local process solutions for $t=0.1$ (a), $t=0.5$ (b), and $t=0.9$ (c).}
\label{f:parallel heat}
\end{figure}
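To close, the output naming scheme described above can be illustrated by a small sketch. This is not the AMDiS implementation; the function and the format string are assumptions reconstructed from the file names shown in this section.
\begin{lstlisting}{}
#include <cstdio>
#include <string>

// Sketch only: compose a name such as output/heat_proc0_00.000.mesh
// from the configured prefix, the process rank, and the current time.
std::string outputFileName(const std::string& prefix, int rank,
                           double time, const std::string& ext)
{
  char buf[64];
  std::snprintf(buf, sizeof(buf), "_proc%d_%06.3f", rank, time);
  return prefix + buf + ext;
}

// Example: outputFileName("output/heat", 0, 0.0, ".mesh")
// yields "output/heat_proc0_00.000.mesh".
\end{lstlisting}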